---
title: "The log-shipper module"
description: Collecting logs in the Kubernetes cluster using the log-shipper Deckhouse module. Log sending topologies, log filtering, and log metadata enrichment.
---

The module deploys log collector agents on the cluster nodes.
These agents perform minimal transformations and forward logs to the configured destinations.
Each agent is a [vector](https://vector.dev/) instance running with a configuration file generated by Deckhouse.

![log-shipper architecture](../../images/460-log-shipper/log_shipper_architecture.svg)
<!-- Source: https://docs.google.com/drawings/d/1cOm5emdfPqWp9NT1UrB__TTL31lw7oCgh0VicQH-ouc/edit -->

1. Deckhouse watches [ClusterLoggingConfig](cr.html#clusterloggingconfig), [ClusterLogDestination](cr.html#clusterlogdestination) and [PodLoggingConfig](cr.html#podloggingconfig) custom resources.
   The combination of a logging source and a log destination is called a `pipeline`.
2. Deckhouse generates a configuration file and stores it in a Kubernetes `Secret`.
3. The `Secret` is mounted into all log-shipper agent Pods, and the configuration is reloaded on changes by the `reloader` sidecar container.
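
For illustration, a minimal pipeline might look like the following sketch. The resource names, the namespace, and the Loki endpoint are hypothetical; see the [custom resources](cr.html) reference for the full specification.

```yaml
# Hypothetical source: collect logs from all Pods in the "my-app" namespace.
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLoggingConfig
metadata:
  name: my-app-logs
spec:
  type: KubernetesPods
  kubernetesPods:
    namespaceSelector:
      matchNames:
        - my-app
  destinationRefs:
    - loki-storage
---
# Hypothetical destination: a Loki instance running in the cluster.
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLogDestination
metadata:
  name: loki-storage
spec:
  type: Loki
  loki:
    endpoint: http://loki.loki:3100
```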

## Deployment topologies

This module deploys only agents on the nodes. However, it assumes that logs are shipped from the cluster using one of the following topologies.

### Distributed

Agents send logs directly to the storage, e.g., Loki, Elasticsearch.

![log-shipper distributed](../../images/460-log-shipper/log_shipper_distributed.svg)
<!-- Source: https://docs.google.com/drawings/d/1FFuPgpDHUGRdkMgpVWXxUXvfZTsasUhEh8XNz7JuCTQ/edit -->

* A less complicated scheme to use.
* Available out of the box without any external dependencies besides the storage.
* Complicated transformations consume more resources on the worker nodes.
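
In this topology, the destination is the storage itself. A hypothetical sketch (the endpoint and the index are illustrative):

```yaml
# Hypothetical destination for the distributed topology:
# agents on the nodes write directly to Elasticsearch.
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLogDestination
metadata:
  name: es-storage
spec:
  type: Elasticsearch
  elasticsearch:
    endpoint: http://elasticsearch.elastic-stack:9200
    index: "logs-%F"
```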

### Centralized

All logs are aggregated by one of the supported aggregation destinations, e.g., Logstash or Vector.
Agents on the nodes perform minimal transformations and ship logs off the nodes as fast as possible with minimal resource consumption.
Complicated mappings are applied on the aggregator's side.

![log-shipper centralized](../../images/460-log-shipper/log_shipper_centralized.svg)
<!-- Source: https://docs.google.com/drawings/d/1TL-YUBk0CKSJuKtRVV44M9bnYMq6G8FpNRjxGxfeAhQ/edit -->

* Fewer resources are consumed on the worker nodes.
* Users can configure arbitrary mappings on the aggregators and send logs to many more storages.
* Dedicated aggregator nodes can be scaled up and down as the load changes.
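
A hypothetical destination for this topology might look like this (the aggregator address is illustrative; deploying and configuring the aggregator itself is outside the module's scope):

```yaml
# Hypothetical destination for the centralized topology:
# node agents forward logs to a standalone Vector aggregator,
# which applies the heavy mappings before writing to storage.
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLogDestination
metadata:
  name: vector-aggregator
spec:
  type: Vector
  vector:
    endpoint: vector-aggregator.example.com:9000
```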

### Stream

The main goal of this topology is to push messages to a queue system as quickly as possible; other workers then read them from the queue and deliver them to long-term storage for later analysis.

![log-shipper stream](../../images/460-log-shipper/log_shipper_stream.svg)
<!-- Source: https://docs.google.com/drawings/d/1R7vbJPl93DZPdrkSWNGfUOh0sWEAKnCfGkXOvRvK3mQ/edit -->

* The same pros and cons as the centralized topology, plus one more intermediate storage layer.
* Increased durability. Suits any infrastructure where log delivery is crucial.
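
A hypothetical destination pointing at a Kafka queue might look like this (the broker address and the topic are illustrative):

```yaml
# Hypothetical destination for the stream topology:
# agents push raw messages to a Kafka topic as fast as possible.
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLogDestination
metadata:
  name: kafka-queue
spec:
  type: Kafka
  kafka:
    bootstrapServers:
      - kafka-broker-0.example.com:9092
    topic: k8s-logs
```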

## Metadata

During collection, all sources enrich logs with metadata. The enrichment takes place at the `Source` stage of the pipeline.

### Kubernetes

The following metadata fields are exposed:

| Label        | Pod object path             |
|--------------|-----------------------------|
| `pod`        | metadata.name               |
| `namespace`  | metadata.namespace          |
| `pod_labels` | metadata.labels             |
| `pod_ip`     | status.podIP                |
| `image`      | spec.containers[].image     |
| `container`  | spec.containers[].name      |
| `node`       | spec.nodeName               |
| `pod_owner`  | metadata.ownerReferences[0] |

| Label        | Node object path                           |
|--------------|--------------------------------------------|
| `node_group` | metadata.labels["node.deckhouse.io/group"] |

{% alert -%}
The Splunk destination does not use `pod_labels`, because it is a nested object of key-value pairs.
{%- endalert %}
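
For illustration only, an enriched log event shipped to a destination might look roughly like this (all values are hypothetical, and the exact shape of fields such as `pod_owner` may differ):

```json
{
  "message": "GET /healthz HTTP/1.1 200",
  "timestamp": "2023-10-01T12:00:00Z",
  "pod": "nginx-5c4f8d9b7-abcde",
  "namespace": "my-app",
  "pod_labels": {
    "app": "nginx"
  },
  "pod_ip": "10.111.0.15",
  "image": "nginx:1.25",
  "container": "nginx",
  "node": "worker-0",
  "pod_owner": "ReplicaSet/nginx-5c4f8d9b7",
  "node_group": "worker"
}
```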

### File

The only exposed label is `host`, which equals the node's hostname.
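
A hypothetical file source might look like this (the path and the destination name are illustrative):

```yaml
# Hypothetical source: collect kernel logs from every node
# and ship them to an existing destination.
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLoggingConfig
metadata:
  name: kernel-logs
spec:
  type: File
  file:
    include:
      - /var/log/kern.log
  destinationRefs:
    - loki-storage
```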

## Log filters

There are two filters to reduce the number of lines sent to the destination: `log filter` and `label filter`.

![log-shipper pipeline](../../images/460-log-shipper/log_shipper_pipeline.svg)
<!-- Source: https://docs.google.com/drawings/d/1SnC29zf4Tse4vlW_wfzhggAeTDY2o9wx9nWAZa_A6RM/edit -->

They are executed right after the lines are concatenated by the multiline log parser.

1. `label filter` — rules are executed against the metadata of a message. Metadata fields (labels) come from a source, so different sources provide different sets of fields for filtering. These rules are useful, for example, for dropping messages that come from a particular container or from Pods with (or without) a certain label.
2. `log filter` — rules are executed against the message itself. It is possible to drop messages based on their JSON fields or, if a message is not JSON-formatted, to exclude lines using a regex.

Both filters have the same structured configuration (a sketch combining both is shown after this list):
* `field` — the source of data to filter (most of the time it is the value of a label or a JSON field).
* `operator` — the action to apply to the value of the field. Possible options are `In`, `NotIn`, `Regex`, `NotRegex`, `Exists`, `DoesNotExist`.
* `values` — this option has a different meaning for different operators:
  * `DoesNotExist`, `Exists` — not supported;
  * `In`, `NotIn` — the value of the field must / must not be in the list of provided values;
  * `Regex`, `NotRegex` — the value of the field must match at least one of the provided regexes / must not match any of the provided regexes.
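
For illustration, here is a hypothetical source combining both filter types (the namespace, container name, and destination name are made up):

```yaml
# Hypothetical source demonstrating both filter types.
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLoggingConfig
metadata:
  name: filtered-logs
spec:
  type: KubernetesPods
  kubernetesPods:
    namespaceSelector:
      matchNames:
        - my-app
  # label filter: keep only messages whose `container` label
  # is not "istio-proxy", i.e., drop the sidecar's logs.
  labelFilter:
    - field: container
      operator: NotIn
      values:
        - istio-proxy
  # log filter: keep only JSON messages whose "level" field is "error".
  logFilter:
    - field: level
      operator: In
      values:
        - error
  destinationRefs:
    - loki-storage
```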

More examples can be found in the [Examples](examples.html) section of the documentation.

{% alert -%}
Extra labels are added at the `Destination` stage of the pipeline, so it is impossible to run the filters described above against them.
{%- endalert %}
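
Extra labels themselves are configured in the destination resource; a minimal hypothetical sketch:

```yaml
# Hypothetical destination with extra labels; they are attached
# at the Destination stage, after all filters have already run.
apiVersion: deckhouse.io/v1alpha1
kind: ClusterLogDestination
metadata:
  name: loki-storage
spec:
  type: Loki
  loki:
    endpoint: http://loki.loki:3100
  extraLabels:
    environment: production
```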
